fix(docs-mcp): recursively crawl and register nested llms.txt resources #2317
🦋 Changeset detected

Latest commit: 7c2e3a9

The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package.
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID:
📒 Files selected for processing (1)
✅ Files skipped from review due to trivial changes (1)
📝 Walkthrough

Converts the docs MCP server registrar into an async crawler that recursively discovers and registers nested

Changes
- Changeset Entry
- Docs MCP Server crawler
Sequence Diagram

sequenceDiagram
participant Main
participant Crawler
participant HTTP
participant MCP
Main->>Crawler: start(baseURL, fromMarkdownText, visited)
Crawler->>HTTP: fetch(link.url)
HTTP-->>Crawler: response
alt link ends with llms.txt & not visited
Crawler->>MCP: register lynx-docs://... (markdown)
Crawler->>Crawler: mark visited and recurse with nested markdown
else normal resource
Crawler->>MCP: guarded register resource (on HTTP OK)
end
Crawler-->>Main: finished
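As a rough illustration of the flow in the diagram, here is a minimal, self-contained TypeScript sketch. It is not the code in `main.ts`: `CrawlDeps`, `extractLinks`, `crawl`, and the injected `fetchText`/`register` callbacks are hypothetical names that let the sketch run without the MCP SDK or network access. It assumes links use plain Markdown `[title](target)` syntax, resolves each link against the index that contains it, and tracks every URL in one shared `visited` set.

```typescript
// Hypothetical dependencies, injected so the sketch runs without the MCP SDK.
interface CrawlDeps {
  // Resolves to the response body on HTTP OK, or null for a non-OK status.
  fetchText: (url: string) => Promise<string | null>;
  // Records a resource by absolute URL; `text` is passed for index files.
  register: (url: string, text?: string) => void;
}

// Collects link targets written as Markdown `[title](target)`.
function extractLinks(markdown: string): string[] {
  const pattern = /\[[^\]]*\]\(([^)\s]+)\)/g;
  const links: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(markdown)) !== null) {
    links.push(match[1]);
  }
  return links;
}

// Registers every link in `markdown` and recurses into nested llms.txt indexes.
async function crawl(
  indexUrl: string,
  markdown: string,
  deps: CrawlDeps,
  visited: Set<string> = new Set<string>([indexUrl]),
): Promise<void> {
  for (const link of extractLinks(markdown)) {
    // Resolve relative links against the index that contains them.
    const url = new URL(link, indexUrl);
    const resolved = url.href;
    if (visited.has(resolved)) continue; // back-link or duplicate: skip
    visited.add(resolved);

    if (url.pathname.endsWith('llms.txt')) {
      const nested = await deps.fetchText(resolved);
      if (nested === null) continue; // non-OK status: do not register
      deps.register(resolved, nested);
      await crawl(resolved, nested, deps, visited); // share one visited set
    } else {
      deps.register(resolved);
    }
  }
}
```

Because the `visited` set also covers leaf documents, a page linked from both the root index and a sub-index is registered only once, which matters if the registration API rejects duplicate names; a back-link such as `../llms.txt` is likewise skipped instead of re-crawling the root.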
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
🧹 Nitpick comments (1)
packages/mcp-servers/docs-mcp-server/main.ts (1)
122-130: Consider making the resource factory consistent with non-nested resources. The factory here returns the cached `nestedMarkdown` captured at registration time, while regular resources (lines 157-171) use an async factory that fetches fresh content on each read. This creates a behavioral inconsistency: nested index resources return startup-time content, while other resources reflect the current server content. If this caching is intentional (to avoid redundant fetches of stable index files), consider adding a brief comment that documents the design decision.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/mcp-servers/docs-mcp-server/main.ts` around lines 122-130, the resource factory currently returns the cached nestedMarkdown captured at registration (the factory returning () => ({ contents: [{ uri: `lynx-docs://${strippedUrl}`, text: nestedMarkdown, mimeType: 'text/markdown' }] })), which is inconsistent with the other resource factories, which are async and fetch fresh content on each read. Either change this factory to an async factory that computes or fetches the current nested markdown on each invocation (e.g., async () => ({ contents: [{ uri: `lynx-docs://${strippedUrl}`, text: await computeNestedMarkdown(...), mimeType: 'text/markdown' }] })) to match the behavior of the resources at lines 157-171, or, if startup caching is intentional, add a short comment above this factory referencing nestedMarkdown and explaining that it is intentionally captured at registration to avoid repeated fetches.
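The two options in this nitpick can be contrasted in a small sketch. It deliberately avoids the MCP SDK: `ReadResult`, `FetchText`, `snapshotReader`, and `liveReader` are hypothetical names, and the result shape only mirrors the `{ contents: [{ uri, text, mimeType }] }` objects quoted above.

```typescript
// Result shape mirroring the `{ contents: [...] }` objects quoted in the review.
interface ReadResult {
  contents: { uri: string; text: string; mimeType: string }[];
}

// Hypothetical fetcher, injected so the sketch runs without network access.
type FetchText = (url: string) => Promise<string>;

// Option A: startup snapshot. The nested index markdown was already fetched
// while crawling, so reads reuse it. This is intentional caching: upstream
// edits to the index are not visible until the server restarts.
function snapshotReader(uri: string, nestedMarkdown: string): () => ReadResult {
  return () => ({
    contents: [{ uri, text: nestedMarkdown, mimeType: 'text/markdown' }],
  });
}

// Option B: live read. Fetches on every call, matching the async factories
// used for regular (non-nested) resources.
function liveReader(
  uri: string,
  sourceUrl: string,
  fetchText: FetchText,
): () => Promise<ReadResult> {
  return async () => ({
    contents: [{ uri, text: await fetchText(sourceUrl), mimeType: 'text/markdown' }],
  });
}
```

Which option fits depends on how often the upstream `llms.txt` indexes change: the snapshot avoids a request per read, while the live reader stays consistent with the other resources at the cost of one fetch per read.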
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e04dd5dd-f09e-45c9-80f1-5621108d9111
📒 Files selected for processing (2)
- .changeset/fix-recursive-docs-mcp.md
- packages/mcp-servers/docs-mcp-server/main.ts
Pull request overview
This PR updates the docs MCP server to recursively discover and register resources referenced by nested llms.txt index files, preventing “Resource not found” errors when documentation is organized under sub-indexes.
Changes:
- Implement recursive crawling of `llms.txt` links to register nested indexes and their referenced resources.
- Add HTTP status handling for nested index fetches and for resource fetches during reads.
- Add a changeset to publish a patch release for `@lynx-js/docs-mcp-server`.
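The overview only says HTTP status handling was added; one plausible shape for it is sketched below. The split between skipping a failed index during crawling and throwing on a failed read is an assumption, not something the PR states, and `MinimalResponse`, `Fetcher`, `fetchIndexOrNull`, and `readResourceText` are hypothetical names. `MinimalResponse` covers only the `ok`, `status`, and `text()` members of a Fetch API `Response`, so a fake can stand in for the network in tests.

```typescript
// Only the members of a Fetch API Response that this sketch relies on.
interface MinimalResponse {
  ok: boolean;
  status: number;
  text(): Promise<string>;
}

// Hypothetical injected fetcher type.
type Fetcher = (url: string) => Promise<MinimalResponse>;

// Crawl time: a missing nested index is skipped (null) rather than having
// its error page registered as documentation.
async function fetchIndexOrNull(url: string, fetcher: Fetcher): Promise<string | null> {
  const res = await fetcher(url);
  if (!res.ok) {
    console.warn(`Skipping nested index ${url}: HTTP ${res.status}`);
    return null;
  }
  return res.text();
}

// Read time: a failed fetch is surfaced as an error instead of returning
// the error body as if it were the resource content.
async function readResourceText(url: string, fetcher: Fetcher): Promise<string> {
  const res = await fetcher(url);
  if (!res.ok) {
    throw new Error(`Failed to fetch ${url}: HTTP ${res.status}`);
  }
  return res.text();
}
```

In Node.js 18 or later, the global `fetch` should satisfy the `Fetcher` type, so it can likely be passed directly, e.g. `await fetchIndexOrNull(url, fetch)`.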
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| packages/mcp-servers/docs-mcp-server/main.ts | Adds recursive crawling/registration of nested llms.txt resources and improves fetch error handling. |
| .changeset/fix-recursive-docs-mcp.md | Patch changeset entry for the docs MCP server. |
Codecov Report
❌ Patch coverage is
Web Explorer #9662 Bundle Size: 901.27KiB (0%). 7c2e3a9 (current) vs 1e1257e main #9648 (baseline)

Bundle size by type

| | Current #9662 | Baseline #9648 |
|---|---|---|
| | 497KiB | 497KiB |
| | 402.06KiB | 402.06KiB |
| | 2.22KiB | 2.22KiB |
Bundle analysis report Branch fix/docs-mcp-recursion Project dashboard
Generated by RelativeCI Documentation Report issue
Branch updated from 80a7953 to 7b4ac27.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/mcp-servers/docs-mcp-server/main.ts`:
- Around line 49-54: The crawler in crawlAndRegisterResources and related blocks
resolves nested link.url against the original root baseURL instead of the
current llms.txt location, so relative links like ./foo.md or ../bar/llms.txt
break; fix by resolving each link against the current resource's URL before
using or recursing: compute a resolved URL using new URL(link.url,
currentResourceBase) where currentResourceBase is the URL of the llms.txt (or
the full URL you just fetched/parsed) rather than the passed-in root baseURL,
use that resolved URL for fetching/registering and pass its origin/path (or the
resolved URL) as the base for recursive calls (update occurrences in
crawlAndRegisterResources and the other blocks mentioned: 99-138, 157-171,
222-228).
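The resolution rule this comment asks for is what the WHATWG `URL` constructor already does when given the index's own URL as the base. The sketch below assumes a hypothetical site at `https://docs.example.com/` with a sub-index at `api/llms.txt`; `resolveAgainstIndex` is a hypothetical helper name.

```typescript
// Resolves a link found inside an llms.txt index relative to that index's URL.
function resolveAgainstIndex(link: string, indexUrl: string): string {
  return new URL(link, indexUrl).href;
}

const rootBase = 'https://docs.example.com/';
const nestedIndex = 'https://docs.example.com/api/llms.txt';

// Resolving against the root loses the `api/` directory.
console.log(new URL('./view.md', rootBase).href); // https://docs.example.com/view.md

// Resolving against the index that contains the link keeps it.
console.log(resolveAgainstIndex('./view.md', nestedIndex)); // https://docs.example.com/api/view.md
console.log(resolveAgainstIndex('../guide/llms.txt', nestedIndex)); // https://docs.example.com/guide/llms.txt
console.log(resolveAgainstIndex('/llms.txt', nestedIndex)); // https://docs.example.com/llms.txt
```

Passing the full index URL, including the `llms.txt` filename, also sidesteps a trailing-slash pitfall: with a base of `https://docs.example.com/api` (no slash), `./view.md` resolves to `https://docs.example.com/view.md`, because the last path segment is treated as a file name.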
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 2b2081ec-7d5c-4d04-8e48-15dd4bbf79a3
📒 Files selected for processing (2)
- .changeset/fix-recursive-docs-mcp.md
- packages/mcp-servers/docs-mcp-server/main.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- .changeset/fix-recursive-docs-mcp.md
Merging this PR will degrade performance by 5.37%
| | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| ❌ | 002-hello-reactLynx-destroyBackground | 863.5 µs | 912.4 µs | -5.37% |
| ⚡ | 008-many-use-state-destroyBackground | 9.5 ms | 8 ms | +19.3% |
Comparing fix/docs-mcp-recursion (7c2e3a9) with main (460ddbd)
Footnotes

1. 26 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them to remove them from the performance reports.
React MTF Example #1220 Bundle Size: 206.65KiB (-0.39%). 7c2e3a9 (current) vs 1e1257e main #1206 (baseline)
React External #1202 Bundle Size: 690.27KiB (-0.4%). 7c2e3a9 (current) vs 1e1257e main #1188 (baseline)
React Example #8089 Bundle Size: 235.77KiB (-0.31%). 7c2e3a9 (current) vs 1e1257e main #8075 (baseline)
Branch updated from 6877936 to 7c2e3a9.
React Example with Element Template #355 Bundle Size: 197.79KiB (0%). 7c2e3a9 (current) vs 1e1257e main #341 (baseline)

Bundle size by type

| | Current #355 | Baseline #341 |
|---|---|---|
| | 145.76KiB | 145.76KiB |
| | 52.03KiB | 52.03KiB |
This PR fixes an issue where nested documentation resources (linked from sub-indexes like `api/llms.txt`) were not being registered by the MCP server, causing 'Resource not found' errors.

Changes:
- … `llms.txt` files in `main.ts`.

Summary by CodeRabbit
Bug Fixes
Chores